An Efficient Architecture for Loop Based Data Preloading
Authors
Abstract
Cache prefetching with the assistance of an optimizing compiler is an effective means of reducing the penalty of long memory access time beyond the primary cache. However, cache prefetching can cause cache pollution and its benefit can be unpredictable. A new architectural support for preloading, the preload buffer, is proposed in this paper. Unlike previously proposed methods of nonbinding cache loads, the preload is a binding access to the memory system. The preload buffer is simple in design and predictable in performance. With simple interleaving, accesses to the preload buffer are independent of the access pattern and processor issue rate, and are therefore free of bank conflicts. Trace-driven simulation shows that preloading hides memory latency better than either no prefetching or cache prefetching. In addition, both the bus traffic rate and the miss rate are reduced.
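The bank-conflict claim can be illustrated with a toy model. This is only a sketch under assumptions that are not stated in the abstract: the buffer is taken to be filled and drained in sequential order, buffer entries are mapped to banks by low-order interleaving (`entry % num_banks`), and the issue width does not exceed the number of banks. The names `bank_of`, `count_bank_conflicts`, and `NUM_BANKS` are hypothetical and do not come from the paper.

```python
def bank_of(index: int, num_banks: int) -> int:
    """Low-order interleaving: consecutive indices map to consecutive banks."""
    return index % num_banks


def count_bank_conflicts(indices, num_banks: int, issue_width: int) -> int:
    """Count issue groups in which two simultaneous accesses hit the same bank."""
    conflicts = 0
    for start in range(0, len(indices), issue_width):
        group = indices[start:start + issue_width]
        banks = [bank_of(i, num_banks) for i in group]
        if len(set(banks)) < len(banks):
            conflicts += 1
    return conflicts


NUM_BANKS = 4  # hypothetical bank count

# Assumption: the preload buffer is consumed in order, so the indices seen by
# its banks are sequential no matter which memory addresses the loop touches.
buffer_entries = list(range(32))

# For contrast: a banked structure indexed directly by a stride-8 address
# stream, where every access falls into the same bank.
strided_addresses = [i * 8 for i in range(32)]

for width in (1, 2, 4):
    print(
        width,
        count_bank_conflicts(buffer_entries, NUM_BANKS, width),
        count_bank_conflicts(strided_addresses, NUM_BANKS, width),
    )
```

In this model the sequentially indexed buffer never conflicts for any issue width up to the bank count, whereas the address-indexed structure conflicts in every multi-access group. Whether the paper's preload buffer is organized this way is an assumption of the sketch.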
Similar resources
An Efficient Architecture for Loop Based Data Preloading
Cache prefetching with the assistance of an optimizing compiler is an effective means of reducing the penalty of long memory access time beyond the primary cache. However, cache prefetching can cause cache pollution and its benefit can be unpredictable. A new architectural support for preloading, the preload buffer, is proposed in this paper. Unlike previously proposed methods of non-binding cache...
Performance Modelling and Optimization of Memory Access on Cellular Computer Architecture Cyclops64
This paper focuses on the Cyclops64 computer architecture and presents an analytical model and performance simulation results for the preloading and loop unrolling approaches to optimize the performance of SVD (Singular Value Decomposition) benchmark. A performance model for dissecting the total execution cycles is presented. The data preloading using “memcpy” or hand optimized “inline” assembl...
Dual Phase Detector Based Delay Locked Loop for High Speed Applications
In this paper a new architecture for delay locked loops is presented. One of the problems in phase-frequency detectors (PFD) is static phase offset or reset path delay. The proposed structure decreases the jitter resulting from the PFD by switching between two PFDs. In this new architecture, a conventional PFD is used before locking of DLL to decrease the amount of phase difference between input and outpu...
Efficient Program Partitioning Based on Compiler Controlled Communication
In this paper, we present an efficient framework for intraprocedural performance based program partitioning for sequential loop nests. Due to the limitations of static dependence analysis especially in the inter-procedural sense, many loop nests are identified as sequential but available task parallelism amongst them could be potentially exploited. Since this available parallelism is quite limited...
Design and Implementation of Digital Demodulator for Frequency Modulated CW Radar (RESEARCH NOTE)
Radar Signal Processing has been an interesting area of research for realization of programmable digital signal processor using VLSI design techniques. Digital Signal Processing (DSP) algorithms have been an integral design methodology for implementation of high speed application specific real-time systems especially for high resolution radar. CORDIC algorithm, in recent times, is turned out to...
Publication date: 1998